Context-Adaptive-based Image Captioning by Bi-CARU
Authors
Abstract
Image captions are abstract expressions of content representations using text sentences, helping readers to better understand and analyse information between different media. Encoder-decoder neural networks have the advantage that they can provide a rational structure for tasks such as image coding and caption prediction. This work introduces a Convolutional Neural Network (CNN) to Bidirectional Content-Adaptive Recurrent Unit (Bi-CARU) model (CNN-to-Bi-CARU) that operates bidirectionally to consider contextual features and captures the major features of the image. The encoded form of the image is passed into the forward and backward layers of CARU, respectively, to refine the word prediction and provide the output for captioning. An attention layer is also introduced to collect the features produced by the context-adaptive gate in CARU, aiming to compute the weighting for relationship extraction and determination. In experiments, the proposed CNN-to-Bi-CARU model outperforms other advanced models in the field, achieving a detailed representation of captions. It obtains a score of 41.28 on BLEU@4, 31.23 on METEOR, 61.07 on ROUGE-L, and 133.20 on CIDEr-D, making it competitive for image captioning on the MSCOCO dataset.
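The data flow described in the abstract (encoded image features passed through a forward and a backward recurrent layer, then weighted by attention) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: ToyCell is a plain tanh recurrent cell standing in for CARU (the CARU gate equations are not given here), the feature sequence is random data standing in for CNN output, and the dot-product attention with a hypothetical query vector is an assumed simplification of the attention layer.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product for list-based matrices."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyCell:
    """Stand-in recurrent cell (plain tanh RNN). NOT the CARU equations."""

    def __init__(self, in_dim, hid_dim, rng):
        self.w_x = make_matrix(hid_dim, in_dim, rng)
        self.w_h = make_matrix(hid_dim, hid_dim, rng)

    def step(self, x, h):
        a = matvec(self.w_x, x)
        b = matvec(self.w_h, h)
        return [math.tanh(p + q) for p, q in zip(a, b)]


def run(cell, sequence, hid_dim):
    """Run a cell over a sequence and return the hidden state at every step."""
    h = [0.0] * hid_dim
    states = []
    for x in sequence:
        h = cell.step(x, h)
        states.append(h)
    return states


def bidirectional(fwd_cell, bwd_cell, sequence, hid_dim):
    """Concatenate forward states with re-aligned backward states."""
    forward = run(fwd_cell, sequence, hid_dim)
    backward = run(bwd_cell, sequence[::-1], hid_dim)[::-1]
    return [f + b for f, b in zip(forward, backward)]


def attention_pool(states, query):
    """Softmax over dot-product scores, then a weighted sum of the states."""
    scores = [sum(s * q for s, q in zip(state, query)) for state in states]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    context = [sum(w * st[i] for w, st in zip(weights, states)) for i in range(dim)]
    return weights, context


rng = random.Random(0)
IN_DIM, HID_DIM, STEPS = 4, 3, 5

# Hypothetical stand-in for a sequence of CNN feature vectors.
features = [[rng.uniform(-1.0, 1.0) for _ in range(IN_DIM)] for _ in range(STEPS)]

states = bidirectional(ToyCell(IN_DIM, HID_DIM, rng), ToyCell(IN_DIM, HID_DIM, rng),
                       features, HID_DIM)
query = [rng.uniform(-1.0, 1.0) for _ in range(2 * HID_DIM)]  # assumed query vector
weights, context = attention_pool(states, query)
```

Each step's state has twice the hidden size because the forward and backward outputs are concatenated; the attention weights form a distribution over the steps, and the context vector is what a decoder would consume when predicting the next word.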
Similar resources
Phrase-based Image Captioning
Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a sample image. This model has a strong focus on the syntax of the descriptions. We train a purely bilinear model that learns a metric between an image representat...
Improving Image Captioning by Concept-Based Sentence Reranking
This paper describes our winning entry in the ImageCLEF 2015 image sentence generation task. We improve Google’s CNN-LSTM model by introducing concept-based sentence reranking, a data-driven approach which exploits the large amounts of concept-level annotations on Flickr. Different from previous usage of concept detection that is tailored to specific image captioning models, the proposed approac...
Context-based adaptive image resolution upconversion
We propose a practical context-based adaptive image resolution upconversion algorithm. The basic idea is to use a low-resolution (LR) image patch as a context in which the missing high-resolution (HR) pixels are estimated. The context is quantized into classes and for each class an adaptive linear filter is designed using a training set. The training set incorporates the prior knowledge of the ...
Unpaired Image Captioning by Language Pivoting
Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping from the image to its natural language description. In general, the mapping function is learned from a training set of image-caption pairs. However, for some languages, a large-scale image-caption paired corpus might not be available. We present an approach to this ...
Domain-Specific Image Captioning
We present a data-driven framework for image caption generation which incorporates visual and textual features with varying degrees of spatial structure. We propose the task of domain-specific image captioning, where many relevant visual details cannot be captured by off-the-shelf general-domain entity detectors. We extract previously-written descriptions from a database and adapt them to new q...
Journal
Journal title: IEEE Access
Year: 2023
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2023.3302512